Estimating Grammar Parameters Using Bounded Memory
Abstract
Estimating the parameters of stochastic context-free grammars (SCFGs) from data is an important, well-studied problem. Almost without exception, existing approaches make repeated passes over the training data. The memory requirements of such algorithms are ill-suited for embedded agents exposed to large amounts of training data over long periods of time. We present a novel algorithm, called HOLA, for estimating the parameters of SCFGs that computes summary statistics for each string as it is observed and then discards the string. The memory used by HOLA is bounded by the size of the grammar, not by the amount of training data. Empirical results show that HOLA performs as well as the Inside-Outside algorithm on a variety of standard problems, despite the fact that it has access to much less information.
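The bounded-memory idea in the abstract (summarize each string, then discard it) can be illustrated with a minimal sketch. This is not HOLA itself, whose details are not given here. It assumes a hypothetical, unambiguous toy grammar `S -> a S b | a b`, for which each string has a single derivation, so plain relative-frequency counting is enough. The names `RULES`, `rule_counts`, and `StreamingEstimator` are invented for illustration.

```python
from collections import Counter

# Hypothetical toy grammar (an assumption, not from the paper): S -> a S b | a b
RULES = {"S": ["a S b", "a b"]}


def rule_counts(string):
    """Count rule uses in the unique derivation of a^n b^n (n >= 1)."""
    n = len(string) // 2
    if n < 1 or string != "a" * n + "b" * n:
        raise ValueError(f"not in the toy language: {string!r}")
    # The recursive rule fires n - 1 times, the base rule exactly once.
    return Counter({("S", "a S b"): n - 1, ("S", "a b"): 1})


class StreamingEstimator:
    """Keeps one counter per rule, so memory depends only on grammar size."""

    def __init__(self, rules):
        self.counts = {
            (lhs, rhs): 0 for lhs, rhss in rules.items() for rhs in rhss
        }

    def observe(self, string):
        # Fold the string's summary statistics into the counters.
        for rule, count in rule_counts(string).items():
            self.counts[rule] += count
        # The string itself is never stored.

    def probabilities(self):
        # Normalize counts over rules that share a left-hand side.
        totals = Counter()
        for (lhs, _), count in self.counts.items():
            totals[lhs] += count
        return {
            rule: (count / totals[rule[0]] if totals[rule[0]] else 0.0)
            for rule, count in self.counts.items()
        }


est = StreamingEstimator(RULES)
for s in ["ab", "aabb", "aaabbb", "ab"]:
    est.observe(s)
probs = est.probabilities()
```

After the four strings above, the recursive rule has been used 3 times and the base rule 4 times, so the estimates are 3/7 and 4/7. For ambiguous grammars this simple counting no longer applies, which is where algorithms such as Inside-Outside (and, per the abstract, HOLA) come in.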
Similar Resources
Estimating Grammar Parameters using Bounded Memory
Estimating the parameters of stochastic context-free grammars (SCFGs) from data (i.e., strings) is an important, well-studied problem. Almost without exception, existing approaches make repeated passes over the training data. The memory requirements of such algorithms are ill-suited for embedded agents exposed to large amounts of training data over long periods of time. We present a novel algor...
Parallel Parsing of Languages Generated by Ambiguous Bounded Context Grammars
Using the CRCW PRAM model, we describe a language recognition algorithm for an arbitrary grammar in the class of BCPP grammars [9]. (BCPP grammars, which admit ambiguity, are a generalization of both the NTS grammars [14] and Floyd's bounded context (BC) grammars [4].) Using n processors, the algorithm runs in time O(h log n) (O(h) in the case of an unambiguous grammar), where n is the length of t...
Incremental Parsing in Bounded Memory
This tutorial will describe the use of a factored probabilistic sequence model for parsing speech and text using a bounded store of three to four incomplete constituents over time, in line with recent estimates of human short-term working memory capacity. This formulation uses a grammar transform to minimize memory usage during parsing. Incremental operations on incomplete constituents in this t...
Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input
This paper presents a new memory-bounded left-corner parsing model for unsupervised raw-text syntax induction, using unsupervised hierarchical hidden Markov models (UHHMM). We deploy this algorithm to shed light on the extent to which human language learners can discover hierarchical syntax through distributional statistics alone, by modeling two widely-accepted features of human language acqui...
A Bounded Rationality Model of Information Search and Choice in Preference Measurement
It is becoming increasingly easy for researchers and practitioners to collect eye-tracking data during online preference measurement tasks. The authors develop a dynamic discrete choice model of information search and choice under bounded rationality, which they calibrate using a combination of eye-tracking and choice data. Their model extends Gabaix et al.’s (2006) directed cognition model b...